fast-interp: legacy exception handling (try/catch/rethrow/delegate/tag) by matthargett · Pull Request #4949 · bytecodealliance/wasm-micro-runtime

matthargett · 2026-05-21T19:42:22Z

Lifts the cmake guard at build-scripts/unsupported_combination.cmake:67 that forbids
WAMR_BUILD_EXCE_HANDLING=1 WAMR_BUILD_FAST_INTERP=1 and adds the matching dispatch
loop coverage in wasm_interp_fast.c — loader-side EH metadata table, runtime
EH-frame stack, catch-walk for throw / rethrow, delegate forward-to-outer,
tag-with-params payload routing, and result-typed try-region COPY-at-CATCH
alignment. I tried to keep each step bisectable.

Why we built this: we're replacing WasmEdge with WAMR fast-interp as the wasm
runtime in a pure-interpreter App-Store-eligible app, and a migration
blocker is graphql-validation compiled by Porffor — JS-to-wasm output that lowers
try/catch/throw to the wasm-exceptions section. Without EH enabled, fast-interp
rejects the binary at load with invalid section id; with EXCE_HANDLING +
CLASSIC_INTERP it loads but fast-interp is 1.3–1.8× faster on every benchmark we
ran.

Cross-microarch benchmarks: M4 Lion P / M4 Sawtooth E / A14 Icestorm (iPhone 12) / A12 Tempest (iPhone XS) /
S8 (Watch SE2) at
https://github.com/rebeckerspecialties/wasm-benchmark/blob/claude/relaxed-simd-diff-fuzz/README.md#cross-runtime-results-across-apple-silicon-e-cores
Integration tests in our benchmark repo include a Porffor-compiled
graphql-validation workload that mirrors the real-world try { visit(…) } catch (e) { if (e !== abortObj) throw e; } shape and exercises every EH opcode the loader
emits. ASan + UBSan builds are part of the local dev loop.

Companion PR: relaxed-SIMD fast-interp opcode lowering, posted separately
(f32x4.relaxed_madd etc).

Validated that existing benchmarks perform nearly identically in wall-clock time, throughput, cache behavior, and branch prediction, using the CPU bottleneck template in xctrace.

…terp

Enables WAMR_BUILD_EXCE_HANDLING=1 together with FAST_INTERP=1 for the *throw-only* subset of the legacy wasm-eh proposal: modules that declare tags and execute `throw`/`rethrow` but never define a same-function `try`/`catch` handler. The throw escapes via the existing `got_exception` bailout path, exactly like any other trap, and the host sees the exception via `wasm_runtime_get_exception`.

This is the shape produced in the wild by Porffor (the JS-to-wasm compiler used by Fastly's StarlingMonkey): its graphql-validation benchmark, which we measure cross-runtime, contains 561 `throw` opcodes and zero in-wasm try/catch handlers. Every JS throw escapes to the host JS engine, which is the typical Porffor / static-JS-to-wasm pattern.

Three changes:

* `build-scripts/unsupported_combination.cmake`: lift the EXCE_HANDLING + FAST_INTERP ban (with a comment explaining the scope: throw-only is supported, in-function try/catch is the natural follow-up).
* `core/iwasm/interpreter/wasm_loader.c`: when fast-interp parses WASM_OP_THROW, emit the tag index as a uint32 immediate after the auto-emitted THROW opcode. Same shape as how WASM_OP_CALL emits its funcidx.
* `core/iwasm/interpreter/wasm_interp_fast.c`: `HANDLE_OP(WASM_OP_THROW)` now reads the uint32 immediate, surfaces a tag-bearing exception via `wasm_set_exception`, and falls through to `got_exception`.

The other legacy-EH ops (TRY / CATCH / CATCH_ALL / RETHROW / DELEGATE / EXT_OP_TRY) keep the existing "unsupported opcode" diagnostic. They're unreachable for fast-interp-compiled code today (the loader's fast-interp path treats TRY as a plain block via skip_label and never emits CATCH-family opcodes into the IR), so the diagnostic only fires if a future loader change starts emitting them.

Validated end-to-end on aarch64-apple-darwin: a benchmark-core harness loads Porffor's graphql-validation-porf.wasm, runs `m()` (the export that drives the validation pipeline), and gets `result=0`, matching the cross-runtime consensus from wasmtime / WasmEdge interpreter. Before this PR the same workload failed at LOAD with "invalid section id" (the tag section couldn't be parsed without EXCE_HANDLING=1).

Full same-function try/catch lowering (porting the classic interpreter's `find_a_catch_handler` design to fast-interp's slot-allocator + pre-decoded IR) is the natural follow-up.
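The loader/runtime pairing for the THROW immediate can be pictured with a small stand-alone sketch. The helper names (`emit_u32`, `read_u32`, `emit_throw`) and the one-byte opcode encoding are assumptions made for illustration; they are not WAMR's actual emit/read macros, and the real IR may encode the opcode differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical emit helper: append a uint32 immediate to the IR buffer. */
static uint8_t *emit_u32(uint8_t *p, uint32_t value)
{
    memcpy(p, &value, sizeof(value)); /* memcpy tolerates unaligned IR */
    return p + sizeof(value);
}

/* Hypothetical read helper: fetch the immediate and advance frame_ip. */
static uint32_t read_u32(const uint8_t **frame_ip)
{
    uint32_t value;
    memcpy(&value, *frame_ip, sizeof(value));
    *frame_ip += sizeof(value);
    return value;
}

/* Loader side of the sketch: opcode byte first, tag index right behind it. */
static uint8_t *emit_throw(uint8_t *p, uint8_t throw_opcode, uint32_t tag_index)
{
    assert(p != NULL);
    *p++ = throw_opcode;
    return emit_u32(p, tag_index);
}
```

A runtime handler in this model would call `read_u32(&frame_ip)` once after dispatching on the opcode, leaving `frame_ip` on the next instruction.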

Adds per-function `WASMFastEHEntry[]` (sized by the existing `func->exception_handler_count` field, allocated in pass 2 of the preprocess pass and freed in `wasm_loader_unload`) recording each try-region's catch handler pcs in the rewritten fast-interp IR. This is the data the upcoming runtime EH-frame stack will consult when a `throw` walks for a matching catch handler; it is *not yet used* in this commit.

Three pieces of plumbing on the loader side:

* `WASMFastEHCatch` / `WASMFastEHEntry` typedefs in `wasm.h`, plus a `WASMFunction.exception_handlers` field. The struct is gated on `WASM_ENABLE_EXCE_HANDLING && WASM_ENABLE_FAST_INTERP` so classic-interp builds are byte-identical.
* `BranchBlock.eh_entry_idx` (loader-internal CSP slot) and `WASMLoaderContext.cur_eh_entry_idx` (the source-order cursor). These let CATCH / CATCH_ALL / DELEGATE / END handlers resolve back to the right try-region without walking the CSP at runtime, the same pattern the existing fast-interp loader uses to pre-patch BR / BR_IF / BR_TABLE targets.
* Pass-2-only populate logic on the existing CATCH, CATCH_ALL, DELEGATE, and END cases. The pass-1 increment of `exception_handler_count` is now gated on `loader_ctx->p_code_compiled == NULL` so it doesn't double-count when the loader re-scans for the second traverse.

Runtime behavior is unchanged in this commit: CATCH / CATCH_ALL / RETHROW / DELEGATE still hit the "unsupported opcode" stub from the throw-only patch. The dispatch wiring lands in the next commit; this one establishes the data layout reviewers will sanity-check first.

Cost-model note: no changes to any hot-op handler (CALL, LOAD, STORE) and the new struct fields are entirely behind the existing WASM_ENABLE_EXCE_HANDLING guard, matching classic-interp's posture where EH-on builds carry one byte store per PUSH_CSP and a small per-frame allocation but leave hot ops untaxed.
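To illustrate how a source-order cursor plus a per-block index lets CATCH resolve to its try-region, here is a toy loader model. `LoaderSketch`, `loader_step`, the event enum, and the fixed array sizes are all hypothetical; the real loader records handler pcs, not just a catch count.

```c
#include <assert.h>
#include <stdint.h>

enum { EV_TRY, EV_CATCH, EV_END };

#define MAX_DEPTH 16
#define MAX_TRIES 16

typedef struct {
    uint32_t csp_eh_idx[MAX_DEPTH];  /* plays the role of BranchBlock.eh_entry_idx */
    uint32_t depth;                  /* open try-regions */
    uint32_t cur_eh_entry_idx;       /* source-order cursor */
    uint32_t catch_count[MAX_TRIES]; /* stand-in for the per-entry metadata */
} LoaderSketch;

static void loader_step(LoaderSketch *l, int ev)
{
    switch (ev) {
    case EV_TRY:
        assert(l->depth < MAX_DEPTH && l->cur_eh_entry_idx < MAX_TRIES);
        /* Each try takes the next index in source order. */
        l->csp_eh_idx[l->depth++] = l->cur_eh_entry_idx++;
        break;
    case EV_CATCH:
        assert(l->depth > 0);
        /* The block on top of the stack says which entry this catch belongs to. */
        l->catch_count[l->csp_eh_idx[l->depth - 1]]++;
        break;
    case EV_END:
        assert(l->depth > 0);
        l->depth--;
        break;
    default:
        break;
    }
}
```

Because the index is stored on the block when the try opens, a catch that appears after a nested try has closed still lands on the outer entry.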

Wires up the per-frame eh-stack that commit 1 laid the metadata for. A program can now enter and exit a try-region without aborting; same-function throw → catch dispatch still bails out via got_exception (follow-up commit hooks that up).

Frame layout: one extra cell per try-region appended past the value stack in the existing frame->operand[] allocation, sized by cur_wasm_func->exception_handler_count. Functions without try blocks pay zero cells. WASMInterpFrame gains a `uint32 eh_count` (the eh-stack top), clustered next to the existing EH-gated exception_raised/tag_index fields (same cache line, cold path only).

Hot-op invariants preserved:

* No new instructions in HANDLE_OP(WASM_OP_CALL), HANDLE_OP(WASM_OP_*_LOAD_*), HANDLE_OP(WASM_OP_*_STORE_*).
* Dispatch table size is unchanged (slots 0x06 = WASM_OP_TRY, 0x07 = WASM_OP_CATCH, 0x0b = WASM_OP_END, 0x19 = WASM_OP_CATCH_ALL just get new bodies; they previously fell through to the "unsupported opcode" stub).
* eh_count writes/reads only happen on TRY/CATCH/CATCH_ALL/END, none of which are on the dispatch loop's hot path.

Loader changes (wasm_loader.c):

* WASM_OP_TRY no longer skip_labels; emits its `eh_idx:u32` immediate after the auto-emitted opcode byte so the runtime push handler can find the right exception_handlers[] entry.
* WASM_OP_CATCH / CATCH_ALL emit the same `eh_idx:u32` immediate; the runtime handler reads it to find end_of_region_pc to branch to on normal-flow exit.
* WASM_OP_END for try-regions keeps the END byte in the IR (with the patch-list rewind dance to make `br N`-targeted PATCH_END addresses land *on* the END byte so the pop runs for branches too, not just fall-through).

Runtime handlers (wasm_interp_fast.c):

* HANDLE_OP(WASM_OP_TRY) pushes eh_idx onto frame_lp[eh_offset + eh_count] and increments eh_count.
* HANDLE_OP(WASM_OP_CATCH) and HANDLE_OP(WASM_OP_CATCH_ALL) share a body: decrement eh_count, set frame_ip to func->exception_handlers[eh_idx].end_of_region_pc.
* HANDLE_OP(WASM_OP_END) moves out of the "unsupported opcode" block when EXCE_HANDLING is enabled; decrements eh_count.
* WASM_OP_RETHROW / WASM_OP_DELEGATE / EXT_OP_TRY still route to the diagnostic; wired up in a follow-up commit.

After this commit: programs with try-regions where no throw fires inside the try body run correctly (the eh-stack is correctly maintained through entry/exit). Throws inside try bodies still escape via got_exception, matching the throw-only patch's behavior. porf-accurate still errors at the first throw escape (its catch handler does real work; full catch dispatch is the next commit).
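The push/pop discipline described above can be modeled in a few lines. `FrameSketch`, `op_try`, and `op_end` are hypothetical stand-ins for the interpreter frame and the TRY / END handler bodies; the fixed-size `cells` array stands in for `frame_lp`, and the layout is the one-cell-per-region shape of this commit.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_CELLS 32

typedef struct {
    uint32_t eh_count;           /* eh-stack top */
    uint32_t eh_offset;          /* first cell past the value stack */
    uint32_t cells[FRAME_CELLS]; /* stands in for frame_lp */
} FrameSketch;

/* TRY: store the region's metadata index, then bump the top. */
static void op_try(FrameSketch *f, uint32_t eh_idx)
{
    assert(f->eh_offset + f->eh_count < FRAME_CELLS);
    f->cells[f->eh_offset + f->eh_count] = eh_idx;
    f->eh_count++;
}

/* END of a try-region (or normal-flow CATCH): pop one entry. */
static void op_end(FrameSketch *f)
{
    assert(f->eh_count > 0);
    f->eh_count--;
}
```

A function with no try blocks never calls either helper, which corresponds to the "zero cells" claim in the frame-layout paragraph.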

Activates same-function and inter-function catch dispatch for the *void-result* try-region shape (which is what graphql-validation-porf-accurate emits: `06 40` = try-with-blocktype-void). Programs that throw inside a void try body now land in the matching catch handler (or catch_all) instead of escaping to the host trap path. The eh-stack push/pop infrastructure from the prior commit gives us the in-scope handlers; this commit adds the walk and the cross-frame unwind.

Hot-op cost-model check:

* HANDLE_OP(WASM_OP_THROW) is itself a cold op: programs that never throw never enter it. The walk runs in find_a_catch_handler, also cold.
* The one new check on a path every wasm-to-wasm call return visits is the `if (frame->exception_raised)` branch in return_func. Predicted strongly not-taken (exceptions are rare); two AArch64 instructions; identical in shape to classic-interp's existing check at wasm_interp_classic.c:6877.
* The eh-stack cells share the cache line with the value stack they're allocated next to, so the walk hits warm memory.
* CALL / LOAD / STORE handlers are byte-identical to the no-EH path.

Mechanism:

* `find_a_catch_handler` is a labeled block reached either by WASM_OP_THROW or by return_func when a callee stashed a tag on this frame. It walks frame->eh_count entries top-down, skipping entries whose top bit is set (state CATCH: already in an active handler; throws raised inside skip outward). On a tag match it ORs in EH_TRY_CATCH_STATE_BIT and dispatches frame_ip to entry->catches[j].handler_pc (or entry->catch_all_pc when no typed clause matches).
* On exhaustion, the walker stashes exception_tag_index on prev_frame->tag_index, sets prev_frame->exception_raised = true, and goes to return_func. return_func, after RECOVER_CONTEXT has restored the caller's context, re-enters find_a_catch_handler with the caller's frame in scope.
* At the top of the wasm stack (prev_frame->ip == NULL) the walker takes the existing got_exception escape so the host can read the trap message via wasm_runtime_get_exception.
* frame->exception_raised and frame->tag_index are pre-existing fields originally added for classic-interp. exception_raised must now be cleared on every fast-interp frame setup: ALLOC_FRAME doesn't zero-init the header and a stale non-zero byte trips the return_func check on every call return.

Loader-side bug fix: the CATCH and CATCH_ALL emit_uint32(eh_idx) calls used to live inside the `if (loader_ctx->p_code_compiled != NULL)` populate guard. That gating skipped them in pass 1 but ran them in pass 2, so pass 2 wrote 4 bytes per catch *past* the code_compiled buffer allocated based on pass 1's measurement. The overrun corrupted whatever loader allocation the heap placed immediately after, typically func->exception_handlers itself (the first 4 bytes of entry[0], i.e. catch_count, was the usual victim). Surfaced as "wasm exception thrown (tag 0)" on `test_local_throw` where the typed-catch's catches[] array showed count=0 at runtime even though the loader populated count=1 in pass 2: the populate itself wrote correctly, then a later opcode's reserve_block_ret overran the buffer and zeroed catch_count. Moved both emit_uint32 calls outside the populate guard so both passes account for the 4-byte immediate.

State encoding: each eh-stack cell packs the loader's exception_handlers[] index in the low 31 bits and a state bit (EH_TRY_CATCH_STATE_BIT) in the top bit. No cell-count change vs the prior commit; same per-frame allocation footprint.

Known limitation: try-regions with a non-void result-type are not yet supported by the *normal-flow* path. The fix is a loader-side try-body→block-dynamic-offset COPY emit at CATCH processing time (mirrors how WASM_OP_ELSE aligns the if-body's result via reserve_block_ret). See AGENTS.md's "Open follow-up — WAMR fast-interp legacy exception handling" section. graphql-validation-porf-accurate uses void-result try-blocks so it isn't blocked by this.

Verified by `crates/benchmark-core/src/bin/probe_eh_void.rs` (5 cases: typed catch, catch_all, inter-function unwind, nested, no-throw; all PASS) and the existing run_graphql_validation_wamr regression (AS / porf-fast / porf-accurate within run-to-run variance vs the prior commit).

Activates the RETHROW opcode: re-raise the exception currently being handled by the (depth+1)-th `state=CATCH` entry from the top of the per-frame eh-stack. Source form `rethrow N` becomes `RETHROW <N:u32>` in the rewritten IR; the runtime walker scans the eh-stack top-down, skips state=TRY entries (they're not "catch handlers in progress"), and on the (depth+1)-th state=CATCH match reads its stashed caught tag and dispatches to `find_a_catch_handler` exactly as a fresh throw with that tag would.

Storage shape: each eh-stack entry is now `EH_ENTRY_CELLS = 2` cells wide. Cell 0 packs `eh_idx | EH_TRY_CATCH_STATE_BIT` (unchanged); cell 1 holds the wasm tag index of the exception currently being handled on that entry (undefined while the entry is in TRY state; the throw walker writes it on catch dispatch). Frame allocation grows by `exception_handler_count * 2` cells per call; functions without try blocks still pay zero cells.

Hot-op cost-model check:

* No new code in HANDLE_OP(WASM_OP_CALL) / LOAD_* / STORE_*.
* RETHROW is a cold op (only fires inside catch bodies); the walk runs across at most the number of catches nested around the rethrow site.
* TRY's push gained a no-op write (cell 1 stays undefined until the throw walker overwrites it on dispatch): same one indexed store as before, just with a wider stride.
* `frame->exception_raised` init + the return_func hook are unchanged from the prior commit; no new branches on any return path.

Loader-side land-mine cleared: WAMR's shared `check_branch_block` calls `emit_br_info` unconditionally, which for a typical arity-zero catch target writes 4 bytes (arity) + 8 bytes (target ptr placeholder via `add_label_patch_to_list`) into the IR between the auto-emitted opcode label and the next op. RETHROW doesn't *branch* to its target (it walks the eh-stack), so those br-info bytes are dead weight, and worse: they shift our depth immediate past where the runtime `read_uint32(frame_ip)` looks for it. The RETHROW case in the loader now does its own depth + label-type validation (manual `loader_ctx->frame_csp - depth - 1` lookup, LABEL_TYPE_CATCH/CATCH_ALL check) and skips check_branch_block entirely.

Verified by three new cases in `crates/benchmark-core/tests/eh_correctness.rs`:

- `rethrow_depth_zero`: inner catch sets a flag, `rethrow 0`, outer catch sees the same tag (= 11).
- `rethrow_preserves_tag`: two tags ($a, $b); throw $b → inner catch $b → rethrow 0; outer catch $b wins over outer catch $a (= 11).
- `rethrow_depth_one`: nested catches; from inside the innermost (which caught $b), `rethrow 1` re-raises the *outer* catch's tag ($a).

All 23 cases in the EH correctness suite pass; AS / porf-fast / porf-accurate benchmark medians overlap the prior commit's range within run-to-run variance (three runs each).

Wires up the runtime + loader for `try ... delegate N` so the throw walker can re-raise the exception at the target block's location without spending hot-op budget.

Loader (wasm_loader.c, WASM_OP_DELEGATE case): Skip the shared `check_branch_block_for_delegate`: its `emit_br_info` call would write 12 bytes of branch metadata between the auto-emitted DELEGATE label and the next op, dead weight at runtime and (worse) the same alignment-shift gotcha that bit RETHROW. Do the depth read + bounds check inline. In pass 2, count try/catch/catch_all blocks STRICTLY between the delegate's frame and the target block; that count (`delta`) is exactly how many eh-stack entries the runtime walker must skip past, by spec.

Runtime (wasm_interp_fast.c):

* find_a_catch_handler: before catch-matching, check `entry->delegate_target_depth`. If set, mark the delegate's own eh-stack entry consumed (STATE bit) and do `i -= delta; continue;` so the for-loop's natural i-- lands on the first eh-stack entry strictly outside the target block. The `delta + 1 >= i` guard catches "delegate to function block" (target lies outside this function's eh-stack) and falls through to the existing "no handler in this frame" return_func path.
* WASM_OP_DELEGATE: split out of the "unsupported opcode" stub into its own normal-flow handler. It fires when the try body completes without throwing; just `frame->eh_count--` and advance.

Cost shape preserved: zero new bytes in CALL / LOAD / STORE; all delegate work lives on the cold throw walker or the cold normal-flow exit handler.

Wires up the loader + runtime path so a tagged exception with i32 / i64 / v128 parameters delivers its payload to the matching catch body's operand stack (same-function dispatch only). Cross-function dispatch (callee throws, caller catches) still drops the payload; that gap is now surfaced explicitly via the `cross_function_tag_with_params` integration test (#[ignore]'d with the same justification recorded in AGENTS.md).

WASMFastEHCatch grows two fields:

    uint32 param_cell_num;
    int16 *param_dst_offsets;

The dst-slots array is a loader-owned int16[] of length `param_cell_num`, capturing the cell-wise frame_lp slot offsets that the catch body's downstream ops will pop from. NULL for the common tag-without-params case (Porffor's empty-payload tags, all of the spec-test's `tag $err` declarations): no heap allocation, and the runtime walker's copy loop is a trivial zero-iteration no-op.

Loader (wasm_loader.c), CATCH case:

* Swap `PUSH_TYPE` for `PUSH_OFFSET_TYPE` so the catch body's incoming params get fresh `dynamic_offset` slots allocated + emitted as int16 operands in the IR (right after the eh_idx immediate). The PUSH_OFFSET_TYPE emits are dead bytes on the normal-flow CATCH dispatch (which only reads eh_idx and branches to end_of_region_pc), but they're necessary so the catch body's POP_OFFSET_TYPEs find the right slot offsets in frame_offset[].
* Pass 2 captures handler_pc AFTER the PUSH_OFFSET_TYPEs so the throw walker's `frame_ip = handler_pc` lands at the first byte of the catch body proper (skipping the dead dst-slot bytes).
* Pass 2 also bh_memcpy_s's frame_offset[]'s top `param_cell_num` cells into a fresh int16[] on the catch's WASMFastEHCatch; these are the destination offsets the runtime walker will write payload values to.
* Free path in wasm_loader_unload extended to free the per-catch dst-offsets array.

Loader, THROW case (wasm_loader.c):

* Moved the existing `emit_uint32(tag_index)` below the tag-type lookup + validation so `tag_type->param_cell_num` is available.
* After tag_index, emit `<param_cell_num:u32>` plus `<src_offset_i:int16>` for i in 0..param_cell_num. The src offsets are read directly off the top of `loader_ctx->frame_offset[]`; the validation loop above pops frame_ref but doesn't touch frame_offset, so they're stable. Both traverses run the same emit to keep pass-1 / pass-2 size accounting balanced.

Runtime (wasm_interp_fast.c): new locals in the dispatch function (cold-path only, same scope as `exception_tag_index`):

    uint32 throw_param_cell_num = 0;
    int16 *throw_src_offsets = NULL;

These get populated by HANDLE_OP(WASM_OP_THROW), which now reads tag_index + param_cell_num + the src-offsets array off the IR (advancing frame_ip past all three). The pair is consumed by find_a_catch_handler's catch-match dispatch: on a typed-catch match it does the cell-wise copy `frame_lp[dst[c]] = frame_lp[src[c]]`. catch_all dispatch explicitly drops the payload (per spec, catch_all binds no exception values). The copy loop is fully cold (only THROW reaches here); CALL / LOAD / STORE handlers untouched.

WASM_OP_RETHROW: extended to re-point throw_src_offsets at the matched catch's `param_dst_offsets` before goto find_a_catch_handler, so rethrow from inside a typed catch carries the same payload outward. The catch body can't mutate the dst slots (they're allocated from `dynamic_offset`, separate from the local-slot range that local.set writes to), so the values are still the original ones at rethrow time. Rethrow from inside a catch_all (whose `param_dst_offsets == NULL`) falls back to zero-cell; documented as a known limitation.

return_func hook: the cross-frame branch zeros throw_param_cell_num and throw_src_offsets before the goto find_a_catch_handler, since the callee's source slots live in a frame that's about to be torn down. These are the same payload-dropping semantics as the existing cross-function-no-payload case, but explicit instead of relying on uninitialized stack.

Cost shape preserved: zero new bytes in CALL / LOAD / STORE. EH_ENTRY_CELLS still 2; no extra cells per try-region. The two new locals get spilled by the compiler since the hot loop doesn't reference them.
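The cell-wise payload copy is small enough to show in full. `copy_payload` is a hypothetical helper name; in the commit the loop lives inline in the walker, and the offsets here are assumed non-negative for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies `cell_num` payload cells from the throw site's source slots to
   the matched catch's destination slots within the same frame. */
static void copy_payload(uint32_t *frame_lp, const int16_t *dst,
                         const int16_t *src, uint32_t cell_num)
{
    /* Tags without params pass NULL arrays and cell_num == 0:
       the loop body never runs. */
    assert(cell_num == 0 || (dst != NULL && src != NULL));
    for (uint32_t c = 0; c < cell_num; c++)
        frame_lp[dst[c]] = frame_lp[src[c]];
}
```

For an i64 param the two arrays each hold two consecutive offsets, so the value moves as two 32-bit cells.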

Two bugs surfaced once same-function tag-with-params actually got exercised by integration tests:

1. **`PUSH_OFFSET_TYPE` is offset-only.** The CATCH loader was bumping `dynamic_offset` + `frame_offset[]` but never `stack_cell_num`, leaving the operand and ref stacks out of sync. The catch body's first consumer (e.g. `global.set $g`) then hit `wasm_loader_pop_frame_offset`'s polymorphic short-circuit: the CATCH block inherits the polymorphic flag from THROW's `SET_CUR_BLOCK_STACK_POLYMORPHIC_STATE`, and with `available_stack_cell == 0` the pop silently returned without emitting the source-slot operand bytes. The consumer's runtime read then landed on heap garbage and crashed with SIGBUS / SIGSEGV. Fix: pair `PUSH_OFFSET_TYPE` with `PUSH_TYPE` (ref-only) so both stacks advance in lockstep.

2. **Multi-cell `frame_offset[]` entries are unreliable past the first cell.** `wasm_loader_push_frame_offset` writes a meaningful int16 only for the FIRST cell of a multi-cell value (i64, f64, v128); the subsequent cell entries are left uninitialized (just a pointer increment, no write). My pass-1 THROW src-offset emit and pass-2 CATCH dst-offset capture were reading those uninitialized cells directly, producing garbage offsets for any param wider than 32 bits. Fix: walk params (not cells) and synthesize consecutive cell offsets `(first, first+1, ..., first+N-1)` per param, where `first = frame_offset[cell_so_far]`. Matches the runtime invariant that an N-cell value occupies N consecutive frame_lp cells.

3 new integration tests cover the fixes:

* `tag_single_i64_param`: 2-cell payload
* `tag_mixed_i32_i64_params`: exercises per-param cell synthesis (would fail if cell-walk offset by 1)
* `repeated_throw_with_payload`: confirms catch-allocated dst slots get fresh writes every invocation

Plus a wat fix in `nested_try_with_params_inner_wins`: the outer catch's body was `i32.const 999 / global.set $g`, leaving the param on the operand stack at `end`. That was a latent bug masked before tag-with-params support (PUSH_TYPE-only didn't let the param "exist" for validation purposes). Now corrected by adding an explicit `drop` so the catch body's stack validates clean.

No hot-op cost change: all the new loader work is in the cold CATCH / THROW preprocess paths, and the runtime walker copy loop is unchanged.

`try (result T)` regions now route the try body's normal-flow value into the block's `dynamic_offset` slot the same way `else` routes the if-body's value via `reserve_block_ret`. The throw-dispatch path's catch-body END already handled the catch's COPY via the existing reserve_block_ret call; this patch fills the remaining gap by injecting a COPY before each CATCH/CATCH_ALL label so the normal-flow exit (try body completes without throwing → falls through to CATCH → CATCH runtime handler jumps to end_of_region_pc) also deposits the value at the right slot.

Loader (wasm_loader.c):

* WASM_OP_CATCH and WASM_OP_CATCH_ALL: before the existing emit_uint32(eh_idx) emit, call `check_block_stack` on the previous body (the try body on the first CATCH; the prior catch body on subsequent ones) and emit an EXT_OP_COPY_STACK_TOP / _I64 / _V128 if the body's last cell isn't already at `cur_block->dynamic_offset`. The `src != dst` predicate runs in both passes; the sign-stable nature of dynamic_offset (≥ 0) vs const-pool slots (≤ -1) keeps pass-1 size accounting and pass-2 writes aligned even though const-pool slots get renumbered by the qsort/dedup at the start of pass 2.
* Both cases now also `SET_CUR_BLOCK_STACK_POLYMORPHIC_STATE(false)` after `RESET_STACK()`, matching how `WASM_OP_ELSE` resets the if-body's polymorphic flag. Without this reset, a catch body following a throw inherits the polymorphic state and `check_block_stack` at END takes the polymorphic branch (`POP_OFFSET_TYPE` → 2 bytes per return-cell emitted). Those bytes land between the auto-emitted END label and the EH-END branch's `skip_label()`, shifting the re-emitted END label forward and leaving a corrupt handler-ptr at the recorded `handler_pc`: SIGSEGV on the first dispatch.

Multi-return-value try-regions get an explicit "not yet supported" error; they need `EXT_OP_COPY_STACK_VALUES` emit support that's not in this commit. Single-return-value covers every shape Porffor / AS / our 51-case integration suite emits.

6 new result-typed integration tests (single i32 / i64, with and without throw, multi-catch picked by tag, catch_all fallback, mixed-with-locals slot allocation).

Plus a wat fix in `multiple_catches_with_params_pick_by_tag`: the `catch $a` body left its param on the operand stack before the catch-to-catch transition. The previous loader didn't validate catch transitions, so this latent imbalance was silently accepted; now `check_block_stack` runs at every CATCH, catches the unbalanced stack, and reports the spec-required `type mismatch: block requires [] but stack has [i32]`. Added an explicit `drop` in the catch body so the test's wat validates clean.

Verified end-to-end: 51/51 EH integration tests pass (was 45/45 before; +6 new result-typed cases). porf-accurate runs at 15.6 ms median (no regression vs the 17.3 ms baseline; small improvement plausibly from the polymorphic-reset path no longer emitting redundant POP_OFFSET_TYPE operands).

Adds a load-time warning when a br / br_if / br_table opcode crosses one or more LABEL_TYPE_TRY / _CATCH / _CATCH_ALL frames, because the runtime br doesn't pop the eh-stack: each crossed try-region leaks one eh-stack entry that survives until frame teardown.

The simple case (single br out of a try; e.g. the `br_out_of_try_pops_eh_stack` integration test) is benign: the per-frame eh-stack reservation (`exception_handler_count * EH_ENTRY_CELLS` cells, covering every static try-block in the function) leaves room for one stale entry alongside any subsequent sibling try's push, and the top-down walker iterates from `eh_count` down so sibling-try throws still match the most recent push first. The stale entry dies when the frame is freed at function return.

The pathological case, `loop { try { br_to_loop_top } catch }`, leaks one entry per iteration and eventually overflows the static reservation. `bh_assert(eh_count < exception_handler_count)` would catch this, but `bh_assert` is a no-op in release builds (`BH_DEBUG` is unset there), so the out-of-bounds writes go through silently. The warning surfaces the shape in load-time diagnostics so a real embedder sees it before the hard-to-diagnose runtime corruption.

`count_try_blocks_crossed(cur_block, target_block)` walks csp positions from cur_block down to target_block inclusive (target included because br to a non-LOOP target lands AFTER target's end, skipping it; LOOP targets aren't try-typed so the inclusive vs exclusive distinction doesn't change the count). The check fires only in pass 1 (`loader_ctx->p_code_compiled == NULL`) so each br site logs once even though wasm_loader_prepare_bytecode runs the bytecode twice.

No hot-op cost: this is loader-time only.

Verified: porf-accurate doesn't trigger the warning (no br-across-try patterns in the Porffor emit shape, consistent with the PMU profile showing zero hot-op overhead from EH). `br_out_of_try_pops_eh_stack` integration test triggers the warning once and still passes.
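The inclusive walk can be sketched over an array of label types. The enum, the index-based signature, and the array representation of the control stack are assumptions for illustration; the function in the commit takes block pointers.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    LABEL_BLOCK, LABEL_LOOP, LABEL_IF,
    LABEL_TRY, LABEL_CATCH, LABEL_CATCH_ALL
} LabelType;

/* csp[0] is the outermost block; `cur` is the block containing the br and
   `target` the block it branches to, with target <= cur. Counts try-typed
   frames from cur down to target, inclusive. */
static uint32_t count_try_blocks_crossed(const LabelType *csp, uint32_t cur,
                                         uint32_t target)
{
    uint32_t count = 0;
    assert(target <= cur);
    for (uint32_t i = cur;; i--) {
        if (csp[i] == LABEL_TRY || csp[i] == LABEL_CATCH
            || csp[i] == LABEL_CATCH_ALL)
            count++;
        if (i == target)
            break; /* inclusive lower bound; also avoids unsigned wrap */
    }
    return count;
}
```

A non-zero result at a br site is what would trigger the load-time warning; the `loop { try { br } }` shape yields 1 because the LOOP target itself is not try-typed.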

… checks

Marks the four structurally-cold paths in WASM_OP_CALL_INDIRECT (out-of-bounds table index, uninitialized element, unknown function (post-table lookup), indirect-call type mismatch) with `__builtin_expect(cond, 0)`. Well-formed wasm modules pass all four on every dispatched CALL_INDIRECT; the hint lets the compiler:

(a) provide a static-bias fallback for the branch predictor on unseen call sites (first-iteration impact only; Apple Silicon's predictor learns the bias dynamically after a few hits anyway);
(b) lay out the error-handling tail away from the hot path so each pass-through case stays in straight-line I-cache.

Measured on iPhone 12 (A14, Icestorm E-cores) with the graphql-validation workloads: bucket-share deltas are within run-to-run noise on both Porffor and AS, but the Porffor bottleneck is `Processing` (56.78%, backend / load-store saturation) not branch prediction (4.19% Discarded). AS's E-core shows the structural opportunity (27.22% Discarded) but that's the goto-indirect-branch in FETCH_OPCODE_AND_DISPATCH, not the direct branches inside CALL_INDIRECT.

Kept as documentation-as-code: the cold-path semantic is real (spec-required traps that ~never fire on validated modules), and the compile-time cost is zero. Full PMU writeup in out/eh-pmu-iphone12-2026-05-18.md (gitignored).

No correctness change. No hot-op runtime cost. Doesn't affect EH code paths.

The legacy exception-handling spec test suite was previously hardcoded to skip every running mode except classic-interp:

    if [[ "${RUNNING_MODE}" != "classic-interp" ]]; then
        echo "support exception handling in classic-interp"
        return 0
    fi

Now that fast-interp supports the full legacy-EH proposal (TRY / CATCH / CATCH_ALL / RETHROW / DELEGATE / tag-with-params), the gate should allow both modes. This matches the parallel `ENABLE_GC` block a few lines down that already lists `classic-interp` AND `fast-interp` as acceptable.

After this change, `./test_wamr.sh -t fast-interp -m exception-handling` runs the upstream WebAssembly spec EH suite against the fast interpreter, the same suite already validated against classic interp.

When a throw from a nested try is caught by an OUTER handler, the walker previously left the inner-try entries between the throw site and the matched outer entry on the eh-stack. The matched entry got its `EH_TRY_CATCH_STATE_BIT` set, but `frame->eh_count` stayed unchanged. After the outer catch body's END decremented eh_count by one, the inner-try slot remained at the top of the eh-stack with the matched outer entry now sitting *under* it (in-progress bit set).

A subsequent throw inside (or after) the outer catch body would walk that stale state. The walker SKIPs entries with the state bit set, so the outer entry was correctly ignored, but the inner-try entry (no state bit) was treated as live. If the inner try's typed catch happened to match the new tag, the walker dispatched against that stale entry: an out-of-scope catch.

Worse, in a tight loop of `outer try { inner try { throw } catch_other catch_outer { ... } }`, every iteration leaked one inner-try entry. After more iterations than the function's `exception_handler_count`, the next TRY push wrote past the static eh-stack reservation (silently in release builds, since `bh_assert` is a no-op without `BH_DEBUG`).

Fix: at each match-and-dispatch site in `find_a_catch_handler` (both the typed-catch branch and the catch_all branch), set `frame->eh_count = i;` before jumping to the handler. `i` is the loop counter, which equals the index of the matched entry plus one. This pops the nested-try entries above the match in a single indexed store. The matched entry stays at index i-1 with its state bit set; the catch body's END pops it normally when the body completes.

Cost shape: one extra indexed store on the cold throw path, only when a typed catch or catch_all matches. CALL / LOAD / STORE handlers are untouched.

Test added in the external integration suite at `crates/benchmark-core/tests/eh_correctness.rs::outer_catch_unwinds_inner_eh_entries`. The test pattern is: outer try catches `$err`; inner try has a catch for `$err2`. Inner throw of `$err` is caught by outer. Outer catch body re-throws `$err2`, which must propagate UNCAUGHT (inner try is out of scope). The pre-fix walker found the stale inner catch and dispatched to it, producing Ok(99) instead of the trap; post-fix, the walker has no in-scope entries and the throw escapes correctly.

Codex P1 review feedback on rebeckerspecialties/wasm-micro-runtime PR #2: "Unwind skipped EH entries before dispatching catches".
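The unwind-on-match rule described above can be sketched in isolation. This is a minimal model under stated assumptions, not the code in `wasm_interp_fast.c`: the `Frame` and `EhEntry` layouts, `EH_STATE_IN_CATCH` (standing in for `EH_TRY_CATCH_STATE_BIT`), and the helpers `push_try`, `find_catch` and `end_catch` are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model; WAMR's real frame layout differs. */
#define EH_STATE_IN_CATCH 0x1u     /* stands in for EH_TRY_CATCH_STATE_BIT */
#define EH_CATCH_ALL_TAG UINT32_MAX /* marks a catch_all handler */
#define EH_MAX_ENTRIES 8u

typedef struct {
    uint32_t catch_tag; /* tag this try's catch accepts */
    uint32_t flags;
} EhEntry;

typedef struct {
    EhEntry eh_stack[EH_MAX_ENTRIES];
    uint32_t eh_count;
} Frame;

/* TRY: push one entry onto the per-frame eh-stack. */
void push_try(Frame *f, uint32_t catch_tag)
{
    assert(f->eh_count < EH_MAX_ENTRIES);
    f->eh_stack[f->eh_count].catch_tag = catch_tag;
    f->eh_stack[f->eh_count].flags = 0;
    f->eh_count++;
}

/* Walk innermost to outermost. Returns the matched index, or -1 if the
 * throw escapes this frame. */
int find_catch(Frame *f, uint32_t thrown_tag)
{
    for (uint32_t i = f->eh_count; i > 0; i--) {
        EhEntry *e = &f->eh_stack[i - 1];
        if (e->flags & EH_STATE_IN_CATCH)
            continue; /* handler already running: skip it */
        if (e->catch_tag == thrown_tag || e->catch_tag == EH_CATCH_ALL_TAG) {
            e->flags |= EH_STATE_IN_CATCH;
            /* The fix: i is matched index + 1, so this single store
             * drops every nested-try entry above the match. */
            f->eh_count = i;
            return (int)(i - 1);
        }
    }
    return -1;
}

/* END of a catch body: pop the matched entry. */
void end_catch(Frame *f)
{
    assert(f->eh_count > 0);
    f->eh_count--;
}
```

With an outer try catching tag 1 and an inner try catching tag 2, a throw of tag 1 matches index 0 and leaves `eh_count` at 1, so a later throw of tag 2 from the outer catch body finds no live entry and escapes, which mirrors the test pattern described above.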

The walker's "no handler in this frame" path previously set `prev_frame->exception_raised = true` and let `return_func` forward the throw to the caller, regardless of payload size. This silently lost the payload: the source cells (`throw_src_offsets`) live in *this* frame's `frame_lp`, which return_func is about to tear down. The caller's `find_a_catch_handler` then ran with `throw_param_cell_num = 0`, which made any typed catch in the caller bind uninitialized destination slots. The catch body would either see garbage in its payload locals or, if the typed catch's slots were used as struct-of-pointers, dereference freed memory.

Cross-function payload preservation would require a per-thread scratch buffer to ferry the payload across the frame boundary (callee's frame_lp → buffer → caller's frame_lp), plus a small change to return_func to populate it before tearing down the callee. That's a meaningful design lift and out of scope for this commit.

Safe action for now: when a payload-bearing throw escapes its callee (i.e. `throw_param_cell_num > 0` and we're about to return to a caller frame), trap to the host with the diagnostic `"cross-function exception payload not supported by fast-interp"`. Same-function payload routing (the common Porffor / AS shape, where a JS throw is caught by an in-function catch the JS-to-wasm compiler emitted) is unaffected: that path dispatches via the same-function match in the walker before this branch runs.

A `catch_all` in the caller would technically tolerate a zero-payload bind, but the typed-vs-catch_all choice happens in the caller's walker, which we can't peek into here without coupling the frames. Trap unconditionally for payload-bearing cross-frame throws.

Tests:
* `cross_function_tag_with_params` stays `#[ignore]`: that's the eventual success case for when cross-frame payload routing is implemented.
* `cross_function_tag_with_params_traps` (new) asserts the current trap-with-expected-message contract on the same module shape.

Codex P1 review feedback on rebeckerspecialties/wasm-benchmark PR #3 (patch 0007 line 306): "Preserve cross-frame exception payloads".
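A rough sketch of the decision this commit describes, assuming a much simpler frame model than the interpreter's. The `UnwindAction` enum, the `on_no_handler_in_frame` helper and the two-field `Frame` are hypothetical; the handling of a throw with no caller frame is also an assumption, since the commit message only covers the case where a caller exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of the walker's "no handler in this frame" path. */
typedef enum {
    UNWIND_FORWARD_TO_CALLER, /* payload-free: the caller's walker takes over */
    UNWIND_TRAP_PAYLOAD,      /* payload would be lost at frame teardown */
    UNWIND_UNCAUGHT           /* assumption: no caller, surface to the host */
} UnwindAction;

typedef struct Frame {
    struct Frame *prev_frame;
    bool exception_raised;
} Frame;

/* Diagnostic text taken from the commit message. */
const char cross_frame_payload_msg[] =
    "cross-function exception payload not supported by fast-interp";

UnwindAction on_no_handler_in_frame(Frame *frame,
                                    uint32_t throw_param_cell_num,
                                    const char **trap_msg)
{
    assert(frame != NULL && trap_msg != NULL);

    if (frame->prev_frame == NULL)
        return UNWIND_UNCAUGHT;

    if (throw_param_cell_num > 0) {
        /* The payload cells live in this frame's locals, which are about
         * to be torn down, so trap instead of forwarding garbage. */
        *trap_msg = cross_frame_payload_msg;
        return UNWIND_TRAP_PAYLOAD;
    }

    /* Payload-free throws can still be forwarded safely. */
    frame->prev_frame->exception_raised = true;
    return UNWIND_FORWARD_TO_CALLER;
}
```

The trap is taken before the caller's flag is touched, so a payload-bearing throw never reaches the caller's walker with a zero cell count.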

…egion

When a br skips over a try-region's END, the runtime br doesn't pop eh-stack entries. For a one-shot br to a block / function-end / catch, the leaked entry is absorbed by the static `exception_handler_count * EH_ENTRY_CELLS` reservation and dies at frame teardown; a load-time `LOG_WARNING` surfaces the shape for embedders.

If the br target is a LOOP entry, however, every iteration's TRY push adds one more entry to the eh-stack. After more iterations than the function's `exception_handler_count`, the next TRY push writes past the static reservation. `bh_assert(eh_count < count)` catches this in debug builds, but is a no-op without `BH_DEBUG`, so release builds silently corrupt whatever sat past the reservation in the frame allocation.

This commit changes that pathological shape from "log a warning and accept" to "fail load with an explicit error". The check sits next to the existing `count_try_blocks_crossed > 0` warning at all three branch sites (BR, BR_IF, BR_TABLE) and only fires when `frame_csp_tmp->label_type == LABEL_TYPE_LOOP`. The error message is identical at each site modulo opcode name:

"br[_if|_table] to loop entry from inside try-region not supported in fast interpreter (would leak eh-stack entries per iteration)"

Emitting a synthetic eh-stack pop at the br site would be the other fix and would let valid modules with this shape run, but it complicates the rewritten IR's br-info layout (the br dispatch currently emits a single uint32 depth; a pop-count immediate would need a per-target lookup) and the shape is rare in practice. Rejecting at load is the conservative, App-Store-safe choice: embedders see a deterministic error rather than silent memory corruption.

Test added in the external integration suite: the previously-ignored `br_out_of_try_inside_loop` became `br_out_of_try_inside_loop_rejected`, which asserts the loader fails with the expected error string.

Codex P1 review feedback on both PRs ("Reject branches that leak EH entries" / "Reject branches that leak EH stack entries").
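The load-time policy can be illustrated with a small stand-alone check. This is a sketch under assumptions, not the loader's code: the control stack is modeled as a plain array of label types, `check_br_target` and `BrCheck` are hypothetical names, and "try blocks crossed" is assumed to mean the TRY labels nested strictly inside the branch target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified subset of label kinds, for illustration only. */
typedef enum {
    LABEL_TYPE_FUNCTION,
    LABEL_TYPE_BLOCK,
    LABEL_TYPE_LOOP,
    LABEL_TYPE_TRY
} LabelType;

typedef enum {
    BR_CHECK_OK,    /* no try-region crossed */
    BR_CHECK_WARN,  /* one-shot leak, absorbed by the static reservation */
    BR_CHECK_REJECT /* leak per loop iteration: fail the load */
} BrCheck;

/* csp[0] is the outermost label and csp[csp_num - 1] the innermost;
 * depth is the branch's relative label index. */
BrCheck check_br_target(const LabelType *csp, uint32_t csp_num,
                        uint32_t depth, const char *opcode_name,
                        char *error_buf, size_t error_buf_size)
{
    assert(csp_num > 0 && depth < csp_num);
    uint32_t target = csp_num - 1 - depth;
    uint32_t tries_crossed = 0;

    /* Assumption: count TRY labels nested strictly inside the target. */
    for (uint32_t i = target + 1; i < csp_num; i++) {
        if (csp[i] == LABEL_TYPE_TRY)
            tries_crossed++;
    }

    if (tries_crossed == 0)
        return BR_CHECK_OK;

    if (csp[target] == LABEL_TYPE_LOOP) {
        snprintf(error_buf, error_buf_size,
                 "%s to loop entry from inside try-region not supported in "
                 "fast interpreter (would leak eh-stack entries per "
                 "iteration)",
                 opcode_name);
        return BR_CHECK_REJECT;
    }
    return BR_CHECK_WARN;
}
```

Only the loop-target case is fatal here; a branch that crosses a try-region to a non-loop label keeps the warning behavior the commit message describes.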

Windows MSVC build of upstream PR bytecodealliance#4949 failed with `LNK2019: unresolved external symbol __builtin_expect` because `__builtin_expect` is a GCC/Clang builtin and MSVC has nothing equivalent. The branch-predictor hints are an optimization, not correctness, so the simplest portable fix is a no-op fallback gated on `!defined(__GNUC__) && !defined(__clang__)`. Lives at the top of `wasm_interp_fast.c` rather than in `bh_platform.h` to avoid touching the shared header for a local cold-path concern.
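A no-op fallback of the kind described might look like the following. This is a sketch, not the literal patch: the guard condition is the one quoted in the commit message, while the `LIKELY` / `UNLIKELY` wrappers and the `checked_div` function are hypothetical and exist only to show a call site compiling on both toolchain families.

```c
#include <assert.h>
#include <stddef.h>

/* On compilers without the GCC/Clang builtin, drop the hint and keep
 * only the expression value. */
#if !defined(__GNUC__) && !defined(__clang__)
#define __builtin_expect(expr, expected) (expr)
#endif

/* Hypothetical convenience wrappers around the hint. */
#define LIKELY(x) __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical call site: the zero-divisor branch is marked cold. */
int checked_div(int a, int b, int *out)
{
    assert(out != NULL);
    if (UNLIKELY(b == 0))
        return -1; /* cold path */
    *out = a / b;
    return 0;
}
```

Because the fallback expands to the bare expression, the program's behavior is the same with or without the hint; only branch layout can differ.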

Upstream PR bytecodealliance/wasm-micro-runtime#4949 failed every `build_iwasm` matrix entry on Windows MSVC with `LNK2019: unresolved external symbol __builtin_expect referenced in function wasm_interp_call_func_bytecode`. The cold-path hints we added in patch 0011 use the GCC/Clang `__builtin_expect` intrinsic; MSVC has no equivalent.

Drop-in no-op shim gated on `!defined(__GNUC__) && !defined(__clang__)`. The hints are branch-predictor optimization, not correctness, so dropping them on MSVC is fine. Same change is on the upstream PR branch as commit `0411662d` (separate fixup commit; lands in the PR sequence right after patch 0011's equivalent).

Stack-position rationale: patch 0024 (after linmem 0023) inserts 9 lines near the top of `wasm_interp_fast.c` between the SIMDe include guards and `typedef int32 CellType_I32`. Putting it last in the apply-stack avoids shifting line-number anchors for any of the earlier patches.

matthargett · 2026-05-21T20:13:40Z

Update: pushed 0411662d — MSVC __builtin_expect no-op shim. The Windows MSVC matrix that was failing with LNK2019: unresolved external symbol __builtin_expect is now green.

The single remaining CI failure is build_regression_tests (ubuntu-22.04). Its four failing tests (BA issues 2702, 2833, 270801, 270802) all execute under mode: aot / runtime: iwasm-default and crash with exit code -4 (SIGILL) on the AOT binary. This PR doesn't touch AOT codegen or the AOT runtime, only fast-interp. The same SIGILL-on-AOT pattern is visible on main's recent nightly_run (test (ubuntu-22.04, asan, aot, $WASI_TEST_OPTIONS) plus the tsan variant: both marked failure, both running AOT-compiled wasm). My read is that this is an upstream-wide CI-infrastructure issue introduced around the LLVM 22 bump (PR #4937); happy to be told otherwise if I'm misreading. Either way, it is nothing this PR can fix.

matthargett added 15 commits May 18, 2026 15:58

matthargett requested review from TianlongLiang, lum1n0us, no1wudi and yamt as code owners May 21, 2026 19:42

matthargett mentioned this pull request May 21, 2026

fast-interp: relaxed-SIMD opcode lowering #4950

Open


matthargett wants to merge 16 commits into bytecodealliance:main from rebeckerspecialties:feat/legacy-eh-fast-interp-full
